(57) Abstract



The present invention proposes new methods and an apparatus for enhancement of source coding systems utilising high frequency reconstruction
(HFR). It addresses the problem of insufficient noise contents in a reconstructed highband by Adaptive Noise-floor Addition. It also
introduces new methods for enhanced performance by means of limiting unwanted noise, interpolation and smoothing of envelope adjustment
amplification factors. The present invention is applicable to both speech coding and natural audio coding systems.

ENHANCING PERCEPTUAL PERFORMANCE OF SBR AND RELATED HFR
CODING METHODS BY ADAPTIVE NOISE-FLOOR ADDITION AND NOISE
SUBSTITUTION LIMITING



TECHNICAL FIELD 

The present invention relates to source coding systems utilising high frequency reconstruction (HFR) such as
Spectral Band Replication, SBR [WO 98/57436] or related methods. It improves the performance of both high quality
methods (SBR) and low quality copy-up methods [U.S. Pat. 5,127,054]. It is applicable to both speech coding
and natural audio coding systems. Furthermore, the invention can beneficially be used with natural audio codecs
with or without high-frequency reconstruction, to reduce the audible effect of the frequency band shut-down that usually
occurs under low bitrate conditions, by applying Adaptive Noise-floor Addition.

BACKGROUND OF THE INVENTION

The presence of stochastic signal components is an important property of many musical instruments, as well as the 
human voice. Reproduction of these noise components, which usually are mixed with other signal components, is 
crucial if the signal is to be perceived as natural sounding. In high-frequency reconstruction it is, under certain 
conditions, imperative to add noise to the reconstructed high-band in order to achieve noise contents similar to the 
original. This necessity originates from the fact that most harmonic sounds, from for instance reed or bow 
instruments, have a higher relative noise level in the high frequency region compared to the low frequency region. 
Furthermore, harmonic sounds sometimes occur together with a high frequency noise resulting in a signal with no 
similarity between noise levels of the highband and the low band. In either case, a frequency transposition, i.e. high 
quality SBR, as well as any low quality copy-up-process will occasionally suffer from lack of noise in the replicated 
highband. Even further, a high frequency reconstruction process usually comprises some sort of envelope adjustment, 
where it is desirable to avoid unwanted noise substitution for harmonics. It is thus essential to be able to add and
control noise levels in the high frequency regeneration process at the decoder. 

Under low bitrate conditions natural audio codecs commonly display severe shut down of frequency bands. This is 
performed on a frame to frame basis resulting in spectral holes that can appear in an arbitrary fashion over the entire 
coded frequency range. This can cause audible artifacts. The effect of this can be alleviated by Adaptive Noise-floor 
Addition. 






Some prior art audio coding systems include means to recreate noise components at the decoder. This permits the
encoder to omit noise components in the coding process, thus making it more efficient. However, for such methods
to be successful, the noise excluded in the encoding process by the encoder must not contain other signal
components. This hard decision based noise coding scheme results in a relatively low duty cycle since most noise
components are usually mixed, in time and/or frequency, with other signal components.

SUMMARY OF THE INVENTION 

[...]

- At an encoder, estimating the noise-floor level of an original signal, using dip- and peak-followers applied to a
spectral representation of the original signal;

- At an encoder, mapping the noise-floor level to several frequency bands, or representing it using LPC or any
other polynomial representation;

- At an encoder or decoder, smoothing the noise-floor level in time and/or frequency;

- At a decoder, shaping random noise in accordance to a spectral envelope representation of the original signal,
and adjusting the noise in accordance to the noise-floor level estimated in the encoder;

- At a decoder, smoothing the noise level in time and/or frequency;

[...]

- At a decoder, applying smoothing to the envelope adjustment amplification factors;

- At a decoder, generating a high-frequency reconstructed signal which is the sum of several high-frequency
reconstructed signals [...]



BRIEF DESCRIPTION OF THE DRAWINGS 

[...] in reference to the accompanying drawings, in which:

Fig. 1 illustrates [...] the noise-floor to frequency bands, according to the present invention;

Fig. 2 illustrates [...] smoothing in time and frequency [...];

Fig. 3 illustrates the spectrum of an original input signal;

Fig. 4 illustrates the spectrum of the output signal from a SBR process without Adaptive Noise-floor Addition;
Fig. 5 illustrates the spectrum of the output signal with SBR and Adaptive Noise-floor Addition, according to the 
present invention; 

Fig. 6 illustrates the amplification factors for the spectral envelope adjustment filterbank, according to the present 
invention; 

Fig. 7 illustrates the smoothing of amplification factors in the spectral envelope adjustment filterbank, according to 
the present invention; 

Fig. 8 illustrates a possible implementation of the present invention, in a source coding system on the encoder side; 
Fig. 9 illustrates a possible implementation of the present invention, in a source coding system on the decoder side. 



DESCRIPTION OF PREFERRED EMBODIMENTS 

The below-described embodiments are merely illustrative for the principles of the present invention for improvement 
of high frequency reconstruction systems. It is understood that modifications and variations of the arrangements and 
the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by 
the scope of the impending patent claims and not by the specific details presented by way of description and 
explanation of the embodiments herein. 



Noise-floor level estimation 

When analysing an audio signal spectrum with sufficient frequency resolution, formants, single sinusoids etc. are
clearly visible; this is hereinafter referred to as the fine structured spectral envelope. If, however, a low resolution is
used, no fine details can be observed; this is hereinafter referred to as the coarse structured spectral envelope. The
level of the noise-floor, although it is not necessarily noise by definition, refers throughout the present invention
to the ratio between a coarse structured spectral envelope interpolated along the local minimum points in the high
resolution spectrum, and a coarse structured spectral envelope interpolated along the local maximum points in the
high resolution spectrum. This measurement is obtained by computing a high resolution FFT for the signal segment,
and applying a peak- and dip-follower, Fig. 1. The noise-floor level is then computed as the difference between the
peak- and the dip-follower; since the followers operate on logarithmic values, this difference corresponds to the ratio
defined above. With appropriate smoothing of this signal in time and frequency, a noise-floor level
measure is obtained. The peak follower function and the dip follower function can be described according to eq. 1
and eq. 2,

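$$Y_{peak}(X(k)) = \max\big(Y_{peak}(X(k-1)) - T,\; X(k)\big) \quad \forall\; 1 \le k \le \frac{\mathrm{fftSize}}{2} \qquad \text{eq. 1}$$

$$Y_{dip}(X(k)) = \min\big(Y_{dip}(X(k-1)) + T,\; X(k)\big) \quad \forall\; 1 \le k \le \frac{\mathrm{fftSize}}{2} \qquad \text{eq. 2}$$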

where T is the decay factor, and X(k) is the logarithmic absolute value of the spectrum at line k. The pair is calculated
for two different FFT sizes, one high resolution and one medium resolution, in order to get a good estimate during
vibratos and quasi-stationary sounds. The peak- and dip-followers applied to the high resolution FFT are LP-filtered
in order to discard extreme values. After obtaining the two noise-floor level estimates, the largest is chosen. In one
implementation of the present invention the noise-floor level values are mapped to multiple frequency bands;
however, other mappings could also be used, e.g. curve fitting polynomials or LPC coefficients. It should be pointed
out that several different approaches could be used when determining the noise contents in an audio signal. However,
it is, as described above, one objective of this invention to estimate the difference between local minima and maxima
in a high-resolution spectrum, albeit this is not necessarily an accurate measurement of the true noise-level. Other
possible methods are linear prediction, autocorrelation etc.; these are commonly used in hard decision noise/no noise
algorithms ["Improving Audio Codecs by Noise Substitution", D. Schultz, JAES, Vol. 44, No. 7/8, 1996]. Although
these methods strive to measure the amount of true noise in a signal, they are applicable for measuring a noise-floor
level as defined in the present invention, albeit not giving equally good results as the method outlined above. It is
also possible to use an analysis by synthesis approach, i.e. having a decoder in the encoder and in this manner
assessing a correct value of the amount of adaptive noise required.
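
The peak- and dip-followers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the initialisation of each follower with the first spectral line, and the sign convention of the result (peak minus dip) are choices made for the example only.

```python
def peak_follower(log_spectrum, decay):
    """Follow spectral peaks: fall by at most `decay` per line, jump up to any higher value."""
    out = [log_spectrum[0]]  # assumption: start at the first spectral line
    for k in range(1, len(log_spectrum)):
        out.append(max(out[k - 1] - decay, log_spectrum[k]))
    return out


def dip_follower(log_spectrum, decay):
    """Follow spectral dips: rise by at most `decay` per line, drop down to any lower value."""
    out = [log_spectrum[0]]  # assumption: start at the first spectral line
    for k in range(1, len(log_spectrum)):
        out.append(min(out[k - 1] + decay, log_spectrum[k]))
    return out


def noise_floor_level(log_spectrum, decay):
    """Per-line difference between the peak- and dip-follower, in the logarithmic domain."""
    peaks = peak_follower(log_spectrum, decay)
    dips = dip_follower(log_spectrum, decay)
    return [p - d for p, d in zip(peaks, dips)]
```

A small difference indicates a noise-like region, where peaks and dips lie close together; a large difference indicates a region dominated by pronounced harmonics. The LP-filtering, the second FFT size and the mapping to frequency bands mentioned in the text are left out of the sketch.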

Adaptive Noise-floor Addition 

In order to apply the adaptive noise-floor, a spectral envelope representation of the signal must be available. [...]
The noise-floor is shaped according to this envelope prior to adjusting it to correct levels, according to the values
received by the decoder. It is also possible to adjust the levels with an additional offset given in the decoder.

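In one decoder implementation of the present invention, the received noise-floor levels are compared to an upper
limit given in the decoder, mapped to several filterbank channels and subsequently smoothed by LP filtering in both
time and frequency, Fig. 2. The replicated highband signal is adjusted in order to obtain the correct total signal level
after adding the noise-floor to the signal. The adjustment factors and noise-floor energies are calculated according
to eq. 3 and eq. 4,

$$\mathrm{noiseLevel}(k,l) = \mathrm{sfb\_nrg}(k,l)\,\frac{nf(k,l)}{1 + nf(k,l)} \qquad \text{eq. 3}$$

$$\mathrm{adjustFactor}(k,l) = \sqrt{\frac{1}{1 + nf(k,l)}} \qquad \text{eq. 4}$$

where k indicates the frequency line, l the time index for each sub-band sample, sfb_nrg(k,l) is the envelope
representation, and nf(k,l) is the noise-floor level. [...] Fig. 3 shows the
spectrum of an original signal containing a very pronounced formant structure in the low band, but much less
pronounced in the highband. Processing this with SBR without Adaptive Noise-floor Addition yields a result
according to Fig. 4. Here it is evident that although the formant structure of the replicated highband is correct, the
noise-floor level is too low. The noise-floor level estimated and applied according to the invention yields the result of
Fig. 5, where the noise-floor superimposed on the replicated highband is displayed. The benefit of Adaptive Noise-
floor Addition is here very obvious both visually and audibly.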
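
The level bookkeeping behind eq. 3 and eq. 4 can be checked with a short Python sketch. It assumes the form in which the noise energy is the envelope energy times nf/(1 + nf) and the highband amplitude is scaled by sqrt(1/(1 + nf)), so that the adjusted highband plus the added noise-floor together carry the envelope energy. The function and parameter names are illustrative and do not come from the source.

```python
import math


def noise_floor_gains(envelope_energy, nf):
    """Return (noise_energy, adjust_factor) for one filterbank channel and time slot.

    envelope_energy: target energy from the spectral envelope representation
    nf: noise-floor level (ratio of noise energy to highband energy), nf >= 0
    """
    noise_energy = envelope_energy * nf / (1.0 + nf)   # energy of the noise to be added
    adjust_factor = math.sqrt(1.0 / (1.0 + nf))        # amplitude scaling of the highband
    return noise_energy, adjust_factor


def total_energy(envelope_energy, nf):
    """Energy of the adjusted highband plus the added noise-floor."""
    noise_energy, adjust_factor = noise_floor_gains(envelope_energy, nf)
    highband_energy = envelope_energy * adjust_factor ** 2
    return highband_energy + noise_energy
```

With nf = 0 no noise is added and the highband is left unchanged; as nf grows, energy is moved from the replicated highband to the noise-floor while the total stays at the envelope energy.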

Transposer gain adaptation 

An ideal replication process, utilising multiple transposition factors, produces a large number of harmonic
components, providing a harmonic density similar to that of the original. A method to select appropriate
amplification-factors for the different harmonics is described below. Assume that the input signal is a harmonic
series:

$$x(t) = \sum_i a_i \cos(2\pi f_i t). \qquad \text{eq. 5}$$

A transposition by a factor two yields:

$$y(t) = \sum_i a_i \cos(2 \times 2\pi f_i t). \qquad \text{eq. 6}$$

Clearly, every second harmonic in the transposed signal is missing. In order to increase the harmonic density,
harmonics from higher order transpositions, M = 3, 5 etc., are added to the highband. To benefit the most from multiple
harmonics, it is important to appropriately adjust their levels to avoid one harmonic dominating over another within
an overlapping frequency range. A problem that arises when doing so, is how to handle the differences in signal 
level between the source ranges of the harmonics. These differences also tend to vary between programme material, 
which makes it difficult to use constant gain factors for the different harmonics. A method for level adjustment of 
the harmonics that takes the spectral distribution in the low band into account is here explained. The outputs from
the transposers are fed through gain adjusters, added and sent to the envelope-adjustment filterbank. Also sent to this
filterbank is the low band signal enabling spectral analysis of the same. In the present invention the signal-powers of 
the source ranges corresponding to the different transposition factors are assessed and the gains of the harmonics are 
adjusted accordingly. A more elaborate solution is to estimate the slope of the low band spectrum and compensate 
for this prior to the filterbank, using simple filter implementations, e.g. shelving filters. It is important to note that 
this procedure does not affect the equalisation functionality of the filterbank, and that the low band analysed by the 
filterbank is not re-synthesised by the same. 
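
The gaps left by a single transposition factor (eq. 5 and eq. 6) can be made concrete with a small Python sketch. It assumes idealised transposers that map harmonic number i of the low band to harmonic number M·i, and it only counts which harmonic numbers are present; levels and gain adjustment are not modelled. All names are illustrative.

```python
def transposed_harmonics(num_lowband_harmonics, factors):
    """Map each transposition factor M to the harmonic numbers M * i it produces."""
    return {
        m: [m * i for i in range(1, num_lowband_harmonics + 1)]
        for m in factors
    }


def missing_harmonics(num_lowband_harmonics, factors, first, last):
    """Harmonic numbers in [first, last] that no transposition factor produces."""
    present = set()
    for produced in transposed_harmonics(num_lowband_harmonics, factors).values():
        present.update(produced)
    return [n for n in range(first, last + 1) if n not in present]
```

For a low band holding harmonics 1 to 10 and a highband covering harmonics 11 to 20, factor 2 alone leaves every odd harmonic missing, while adding factors 3 and 5 fills harmonic 15 and leaves only 11, 13, 17 and 19 empty.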

Noise Substitution Limiting 

According to the above (eq. 5 and eq. 6), the replicated highband will occasionally contain holes in the spectrum. 
The envelope adjustment algorithm strives to make the spectral envelope of the regenerated highband similar to that 
of the original. Suppose the original signal has a high energy within a frequency band, and that the transposed signal 
displays a spectral hole within this frequency band. This implies, provided the amplification factors are allowed to 
assume arbitrary values, that a very high amplification factor will be applied to this frequency band, and noise or 
other unwanted signal components will be adjusted to the same energy as that of the original. This is referred to as 
unwanted noise substitution. Let

$$P_1 = [p_{11}, \ldots, p_{1N}] \qquad \text{eq. 7}$$

be the scale factors of the original signal at a given time, and

$$P_2 = [p_{21}, \ldots, p_{2N}] \qquad \text{eq. 8}$$

the corresponding scale factors of the transposed signal, where each element of the two vectors represents sub-band
energy normalised in time and frequency. The required amplification factors for the spectral envelope adjustment
filterbank are obtained as

$$G = [g_1, \ldots, g_N] = \left[\sqrt{\frac{p_{11}}{p_{21}}}, \ldots, \sqrt{\frac{p_{1N}}{p_{2N}}}\right] \qquad \text{eq. 9}$$




By observing G it is trivial to determine the frequency bands with unwanted noise substitution, since these exhibit
much higher amplification factors than the others. The unwanted noise substitution is thus easily avoided by
applying a limiter to the amplification factors, i.e. allowing them to vary freely up to a certain limit, g_max. The
amplification factors using the noise-limiter are obtained by

$$G_{lim} = [\min(g_1, g_{max}), \ldots, \min(g_N, g_{max})]. \qquad \text{eq. 10}$$

However, this expression only displays the basic principle of the noise-limiters. Since the spectral envelope of the
transposed and the original signal might differ significantly in both level and slope, it is not feasible to use constant
values for g_max. Instead, the average gain, defined as

$$G_{avg} = \sqrt{\frac{\sum_i p_{1i}}{\sum_i p_{2i}}} \qquad \text{eq. 11}$$

is calculated and the amplification factors are allowed to exceed that by a certain amount. In order to take wide-band
level variations into account, it is also possible to divide the two vectors P_1 and P_2 into different sub-vectors and
process them accordingly. In this manner, a very efficient noise limiter is obtained, without interfering with, or
confining, the functionality of the level-adjustment of the sub-band signals containing useful information.
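
The limiter of eq. 9 to eq. 11 can be sketched as follows. The sketch assumes that the amplification factors are amplitude gains, i.e. the square root of the per-band energy ratio, that the average gain is the square root of the ratio of the summed energies, and that "a certain amount" is expressed as a multiplicative headroom. These choices, like the names used, are assumptions made for illustration; all energies are taken to be strictly positive.

```python
import math


def limited_gains(p_orig, p_trans, headroom):
    """Per-band gains, limited to `headroom` times the average gain.

    p_orig:   sub-band energies of the original signal (P1)
    p_trans:  sub-band energies of the transposed signal (P2)
    headroom: hypothetical factor by which a gain may exceed the average gain
    """
    if len(p_orig) != len(p_trans):
        raise ValueError("P1 and P2 must have the same number of bands")
    gains = [math.sqrt(a / b) for a, b in zip(p_orig, p_trans)]
    g_avg = math.sqrt(sum(p_orig) / sum(p_trans))
    g_max = headroom * g_avg
    return [min(g, g_max) for g in gains]
```

A band in which the transposed signal has a spectral hole yields a very large unlimited gain; after limiting, it is clamped to the headroom above the average while the remaining bands pass unchanged. Splitting P1 and P2 into sub-vectors, as the text suggests, would amount to calling the function once per sub-vector.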



Interpolation

[...] The scale factors represent an estimate of the spectral density within the frequency band [...]. In order to
obtain the lowest possible bit rate it is desirable to minimise the number of scale factors transmitted, which implies
the usage of as large groups of filter channels as possible. Usually this is done by grouping the frequency bands
according to [...]. [...] the adjustment filterbank can still operate on a filterbank channel basis, by interpolating
values from the received scale factors. The simplest interpolation method is to assign every filterbank channel within
a group the value of that group's scale factor. [...]
envelope, are used to calculate the amplification factors according to the above. There are two major advantages
with this frequency domain interpolation scheme. The transposed signal usually has a sparser spectrum than the 
original. A spectral smoothing is thus beneficial and such is made more efficient when it operates on narrow 
frequency bands, compared to wide bands. In other words, the generated harmonics can be better isolated and 
controlled by the envelope adjustment filterbank. Furthermore, the performance of the noise limiter is improved 
since spectral holes can be better estimated and controlled with higher frequency resolution. 
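
The simplest form of this frequency domain interpolation can be sketched as follows, assuming that each received scale factor is copied to every filterbank channel of its group. The band-edge representation and the function name are hypothetical and chosen only for the example.

```python
def interpolate_scale_factors(scale_factors, band_edges):
    """Expand one scale factor per group into one value per filterbank channel.

    scale_factors: one value per channel group
    band_edges:    channel indices delimiting the groups; group i covers
                   channels band_edges[i] .. band_edges[i + 1] - 1
    """
    if len(band_edges) != len(scale_factors) + 1:
        raise ValueError("need exactly one more band edge than scale factors")
    per_channel = []
    for i, value in enumerate(scale_factors):
        width = band_edges[i + 1] - band_edges[i]
        per_channel.extend([value] * width)  # hold the group value across its channels
    return per_channel
```

The per-channel values can then be compared with per-channel scale factors measured on the transposed signal, so that the amplification factors and the limiter operate at full filterbank resolution.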

Smoothing 

It is advantageous, after obtaining the appropriate amplification factors, to apply smoothing in time and frequency, 
in order to avoid aliasing and ringing in the adjusting filterbank as well as ripple in the amplification factors. Fig. 6 
displays the amplification factors to be multiplied with the corresponding subband samples. The figure displays two 
high-resolution blocks followed by three low-resolution blocks and one high resolution block. It also shows the 
decreasing frequency resolution at higher frequencies. The sharpness of Fig. 6 is eliminated in Fig. 7 by filtering of 
the amplification factors in both time and frequency, for example by employing a weighted moving average. It is 
important however, to maintain the transient structure for the short blocks in time in order not to reduce the transient 
response of the replicated frequency range. Similarly, it is important not to filter the amplification factors for the 
high-resolution blocks excessively in order to maintain the formant structure of the replicated frequency range. In 
Fig. 9b the filtering is intentionally exaggerated for better visibility. 
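
A weighted moving average over the amplification factors, first along frequency and then along time, can be sketched as follows. The grid layout gains[t][k], the renormalisation at the edges and the use of one fixed set of weights for all blocks are assumptions of the example; the text asks for lighter filtering of short blocks and high-resolution blocks, which the sketch does not model.

```python
def weighted_moving_average(values, weights):
    """Smooth `values` with odd-length `weights`, renormalising at the edges."""
    half = len(weights) // 2
    out = []
    for n in range(len(values)):
        acc = 0.0
        norm = 0.0
        for j, w in enumerate(weights):
            idx = n + j - half
            if 0 <= idx < len(values):  # skip taps that fall outside the data
                acc += w * values[idx]
                norm += w
        out.append(acc / norm)
    return out


def smooth_gains(gains, freq_weights, time_weights):
    """Smooth a grid gains[t][k] along frequency (k), then along time (t)."""
    freq_smoothed = [weighted_moving_average(row, freq_weights) for row in gains]
    columns = [list(col) for col in zip(*freq_smoothed)]
    time_smoothed = [weighted_moving_average(col, time_weights) for col in columns]
    return [list(row) for row in zip(*time_smoothed)]
```

Passing a single weight of 1 for the time direction leaves the time structure untouched, which is one simple way to keep transients intact while still smoothing across frequency.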

Practical implementations

The present invention can be implemented in both hardware chips and DSPs, for various kinds of systems, for
storage or transmission of signals, analogue or digital, using arbitrary codecs. Fig. 8 and Fig. 9 show a possible
implementation of the present invention. Here the high-band reconstruction is done by means of Spectral Band
Replication, SBR. In Fig. 8 the encoder side is displayed. The analogue input signal is fed to the A/D converter, 801,
and to an arbitrary audio coder, 802, as well as the noise-floor level estimation unit, 803, and an envelope extraction
unit, 804. The coded information is multiplexed into a serial bitstream, 805, and transmitted or stored. In Fig. 9 a
typical decoder implementation is displayed. The serial bitstream is de-multiplexed, 901, and the envelope data is
decoded, 902, i.e. the spectral envelope of the high-band and the noise-floor level. The de-multiplexed source coded
signal is decoded using an arbitrary audio decoder, 903, and up-sampled, 904. In the present implementation SBR-
transposition is applied in unit 905. In this unit the different harmonics are amplified using the feedback information
from the analysis filterbank, 908, according to the present invention. The noise-floor level data is sent to the
Adaptive Noise-floor Addition unit, 906, where a noise-floor is generated. The spectral envelope data is interpolated,
907, the amplification factors are limited, 909, and smoothed, 910, according to the present invention. The
reconstructed high-band is adjusted, 911, and the adaptive noise is added. Finally, the signal is re-synthesised, 912, and
added to the delayed, 913, low-band. The digital output is converted back to an analogue waveform, 914.

CLAIMS



1. A method for enhancement of source coding systems using high-frequency reconstruction, where said [...]
decoder representing all operations performed after storage or transmission, [...]

at said encoder, estimating the noise-floor level of an original signal;

at said decoder, shaping random noise in accordance to a spectral envelope representation, and adjusting said
noise in accordance to said noise-floor level estimated in said encoder;

at said decoder, adding said noise to the high-frequency reconstructed signal.

[...]

3. A method according to claim 1, characterised in that said noise-floor level is represented using LPC, or any other
polynomial representation.

[...] A method according to claim 1, characterised in that said noise-floor level is estimated using dip- and peak-
followers applied to a spectral representation of said original signal.

[...]

7. A method according to claim 1, characterised in that the spectral envelope of said high-frequency reconstructed
signal is adjusted using limiting of the envelope adjustment amplification factors.

8. A method according to claim 1, characterised in that the spectral envelope of said high-frequency reconstructed
signal is adjusted using interpolation.

9. A method according to claim 1, characterised in that the spectral envelope of said high-frequency reconstructed
signal is adjusted using [...]

10. A method according to claim 1, characterised in that the high-frequency reconstruction generates a signal which
[...]

11. An apparatus for enhancement of source coding systems using high-frequency reconstruction, where said [...]

means for estimating the noise-floor level of an original signal;

12. An apparatus for enhancement of source coding systems using high-frequency reconstruction, where said 
apparatus comprises a decoder, for decoding a coded signal encoded by an encoder, characterised by: 

means for shaping random noise in accordance to a spectral envelope representation, and adjusting said noise
in accordance to said noise-floor level estimated in said encoder;

means for adding said noise to the high-frequency reconstructed signal.